
    GSLAM: Initialization-robust Monocular Visual SLAM via Global Structure-from-Motion

    Many monocular visual SLAM algorithms are derived from incremental structure-from-motion (SfM) methods. This work proposes a novel monocular SLAM method that integrates recent advances in global SfM. In particular, we present two main contributions to visual SLAM. First, we solve the visual odometry problem by a novel rank-1 matrix factorization technique which is more robust to errors in map initialization. Second, we adopt a recent global SfM method for the pose-graph optimization, which leads to a multi-stage linear formulation and enables L1 optimization for better robustness to false loops. The combination of these two approaches produces more robust reconstruction and is significantly faster (4x) than recent state-of-the-art SLAM systems. We also present a new dataset recorded with ground-truth camera motion in a Vicon motion capture room, and compare our method to prior systems on it and on established benchmark datasets. Comment: 3DV 2017. Project Page: https://frobelbest.github.io/gsla
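    As a rough illustration of the rank-1 factorization idea only (the paper's actual odometry formulation is not reproduced here), the sketch below computes the best rank-1 approximation of a generic measurement matrix via a truncated SVD; the matrix contents and noise model are placeholders.

```python
import numpy as np

def rank1_approximation(M):
    """Best rank-1 approximation of a measurement matrix M (in the Frobenius
    sense), computed with a truncated SVD. Generic illustration only; this is
    not the GSLAM odometry formulation itself."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    u = U[:, 0] * s[0]        # scaled left singular vector
    v = Vt[0, :]              # right singular vector
    return np.outer(u, v)     # rank-1 matrix u v^T

# Toy example: recover a rank-1 structure from a noisy matrix.
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(6), rng.standard_normal(8))
M_noisy = M + 0.01 * rng.standard_normal(M.shape)
print(np.linalg.matrix_rank(rank1_approximation(M_noisy)))  # 1
```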

    Linear Global Translation Estimation with Feature Tracks

    This paper derives a novel linear position constraint for cameras seeing a common scene point, which leads to a direct linear method for global camera translation estimation. Unlike previous solutions, this method handles collinear camera motion and weak image association at the same time. The final linear formulation does not involve the coordinates of scene points, which makes it efficient even for large-scale data. We solve the linear equation under the L1 norm, which makes our system more robust to outliers in essential matrices and feature correspondences. We evaluate this method on both sequentially captured images and unordered Internet images. The experiments demonstrate its strength in robustness, accuracy, and efficiency. Comment: Changes: 1. Adopt BMVC 2015 style; 2. Combine sections 3 and 5; 3. Move "Evaluation on synthetic data" out to supplementary file; 4. Divide subsection "Evaluation on general data" into subsections "Experiment on sequential data" and "Experiment on unordered Internet data"; 5. Change Fig. 1 and Fig. 8; 6. Move Fig. 6 and Fig. 7 to supplementary file; 7. Change some symbols; 8. Correct some typos
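    The L1-norm solve can be illustrated with a generic least-absolute-deviations linear program. The sketch below is only an assumed stand-in using SciPy's linprog, not the paper's actual translation constraints; the matrices A and b are synthetic.

```python
import numpy as np
from scipy.optimize import linprog

def l1_least_deviation(A, b):
    """Solve min_x ||A x - b||_1 as a linear program.
    Generic robust linear solver; the translation-estimation constraints
    from the paper are not constructed here."""
    m, n = A.shape
    # Variables: [x (n), t (m)] with |A x - b| <= t elementwise.
    c = np.concatenate([np.zeros(n), np.ones(m)])
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# Example with a few gross outliers in b.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
b[:5] += 10.0                      # outlier measurements
print(l1_least_deviation(A, b))    # close to x_true despite the outliers
```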

    Advanced subspace methods for low/mid-level vision

    Low- and mid-level vision tasks are fundamental to computer vision. They are important not only in themselves but also as cornerstones for higher-level tasks. Low-level tasks extract primitive information from images, such as edges, textures, and correspondences. Mid-level tasks, from the Gestalt psychologists' perspective, are grouping mechanisms on low-level visual information; in particular, inferring geometric information from images and segmenting an image into object-level regions are two major aspects of mid-level tasks. In this thesis, we make advances in solving real-world low- and mid-level problems using subspace-based representations. For monocular visual SLAM, we solve the visual odometry by a rank-1 factorization and solve the pose-graph optimization by multi-stage linear programming, which are more robust to initialization errors in the local 3D maps and the global pose graph respectively. For dense 3D reconstruction, which is also a mid-level task, we represent a depth map as a linear combination of several basis depth maps from an underlying subspace and learn a convolutional neural network to generate such a basis. To estimate the depth maps as well as the camera poses, we propose a differentiable bundle adjustment layer that optimizes the depth map and camera poses by minimizing a feature-metric error. The feature-metric error is defined over a feature pyramid, which is learned jointly with the basis generator end-to-end. For broader low-level vision tasks, we also adopt a basis representation, but for a different purpose. Conventionally, a low-level task is formulated as a continuous energy minimization problem whose objective function contains a data fidelity term and a smoothness regularization term. We replace the regularization term with a learnable subspace constraint and define the objective function only with the data term. This methodology unifies the network structures and parameters for many low-level vision tasks and even generalizes to unseen tasks, as long as the corresponding data terms can be formulated. In summary, we explore subspace-based methods, from manually derived low-rank formulations to learning-based subspace minimization, which are conceptually novel compared to existing methods. To demonstrate the effectiveness of the proposed methods, we conduct extensive experiments for all the involved tasks on public benchmarks as well as our own data. The results show that our methods achieve comparable or better performance than state-of-the-art methods with better computational efficiency.
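    To make the basis-depth idea concrete, here is a minimal sketch assuming a flattened basis of depth maps and sparse depth observations: the combination weights are fit by least squares. The basis below is a random placeholder rather than the learned generator described in the thesis, and the feature-metric bundle adjustment layer is not shown.

```python
import numpy as np

def fit_depth_from_basis(basis, obs_idx, obs_depth):
    """Express a depth map as a linear combination of basis depth maps and fit
    the combination weights to sparse observations by least squares.
    basis: (K, H*W) flattened basis depth maps (placeholder for a learned one),
    obs_idx: indices of observed pixels, obs_depth: their depth values."""
    A = basis[:, obs_idx].T                   # (num_obs, K)
    w, *_ = np.linalg.lstsq(A, obs_depth, rcond=None)
    return basis.T @ w                        # full flattened depth map

# Toy example with a random stand-in basis.
rng = np.random.default_rng(2)
K, H, W = 8, 12, 16
basis = rng.random((K, H * W))
depth = basis.T @ rng.standard_normal(K)      # a depth map inside the subspace
obs_idx = rng.choice(H * W, size=40, replace=False)
rec = fit_depth_from_basis(basis, obs_idx, depth[obs_idx])
print(np.max(np.abs(rec - depth)) < 1e-8)     # exact recovery in this toy case
```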

    Local subspace video stabilization

    Video stabilization enhances video quality by smoothing unstable motion. This paper proposes a new video stabilization method that simultaneously factors and smooths motion trajectories. We model the trajectories with a time-variant local subspace constraint: every column of the trajectory matrix is factored and smoothed in its own local subspace. This model makes our method more flexible and accurate than subspace video stabilization. In addition, we design a novel outlier detection technique that exploits the relationship between consecutive local subspaces. Experiments on synthetic data validate the numerical performance of our factorization. Quantitative comparisons on real videos show that our local method outperforms subspace video stabilization, and our stabilized videos are comparable with the published results of other state-of-the-art methods.
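    As a rough sketch of the underlying idea, the snippet below shows the fixed-subspace baseline: factor a trajectory matrix with a truncated SVD, low-pass filter the per-frame coefficients, and reconstruct. The time-variant local subspaces and outlier detection described above are not implemented, and the rank and window size are arbitrary placeholders.

```python
import numpy as np

def smooth_trajectories_subspace(T, rank=9, window=15):
    """Subspace-style trajectory smoothing: factor the trajectory matrix with a
    truncated SVD, moving-average filter the time-varying coefficients, and
    reconstruct. Sketch of the fixed-subspace baseline only; the paper's local
    subspaces and outlier handling are omitted."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    basis = U[:, :rank] * s[:rank]           # (2N, rank) trajectory basis
    coeff = Vt[:rank, :]                     # (rank, F) per-frame coefficients
    kernel = np.ones(window) / window        # simple moving-average smoother
    smoothed = np.stack([np.convolve(c, kernel, mode="same") for c in coeff])
    return basis @ smoothed                  # smoothed trajectory matrix

# T would be a (2 * num_tracks, num_frames) matrix of x/y feature positions.
```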

    Joint Stabilization and Direction of 360 Degrees Videos

    360° video provides an immersive experience for viewers, allowing them to freely explore the world by turning their head. However, creating high-quality 360° video content can be challenging, as viewers may miss important events by looking in the wrong direction, or they may see things that ruin the immersion, such as stitching artifacts and the film crew. We take advantage of the fact that not all directions are equally likely to be observed; most viewers are more likely to see content located at “true north”, i.e. in front of them, due to ergonomic constraints. We therefore propose 360° video direction, where the video is jointly optimized to orient important events to the front of the viewer and visual clutter behind them, while producing smooth camera motion. Unlike traditional video, viewers can still explore the space as desired, but with the knowledge that the most important content is likely to be in front of them. Constraints can be user-guided, either added directly on the equirectangular projection or by recording “guidance” viewing directions while watching the video in a VR headset, or computed automatically, for example from visual saliency or the forward motion direction. To accomplish this, we propose a new motion estimation technique specifically designed for 360° video which outperforms the commonly used 5-point algorithm on wide-angle video. We additionally formulate the direction problem as an optimization where a novel parametrization of spherical warping allows us to correct for some degree of parallax effects. We compare our approach to recent methods that address stabilization only and methods that convert 360° video to narrow field-of-view video. Our pipeline can also enable viewing of wide-angle non-360° footage in a spherical 360° space, giving an immersive “virtual cinema” experience for a wide range of existing content filmed with first-person cameras.
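    A tiny illustration of what reorienting a frame means in the simplest case: bringing a chosen heading to the viewer's front reduces, for a pure yaw rotation, to a horizontal pixel shift of the equirectangular image. This is an assumed toy example; the smooth directed camera paths, spherical warp parametrization, and parallax correction from the paper are not shown.

```python
import numpy as np

def reorient_equirect(frame, yaw_deg):
    """Yaw-only re-orientation of an equirectangular frame, bringing a heading
    yaw_deg away from the current front to the center column. Toy sketch only;
    tilt/roll and any parallax-correcting warp would need full per-pixel
    spherical remapping."""
    h, w = frame.shape[:2]
    shift = int(round(yaw_deg / 360.0 * w))  # a yaw rotation maps to a column shift
    return np.roll(frame, -shift, axis=1)

# frame: (H, W, 3) equirectangular image;
# reorient_equirect(frame, 90) centers the content that was 90 degrees away in yaw.
```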